
    Counting hypergraph matchings up to uniqueness threshold

    We study the problem of approximately counting matchings in hypergraphs of bounded maximum degree and bounded maximum hyperedge size. With an activity parameter λ, each matching M is assigned a weight λ^|M|. The counting problem is formulated as computing a partition function that gives the sum of the weights of all matchings in a hypergraph. This problem unifies two extensively studied statistical physics models in approximate counting: the hardcore model (graph independent sets) and the monomer-dimer model (graph matchings). For this model, the critical activity λ_c = d^d / (k(d−1)^{d+1}) is the threshold for the uniqueness of Gibbs measures on the infinite (d+1)-uniform (k+1)-regular hypertree. Consider hypergraphs of maximum degree at most k+1 and maximum hyperedge size at most d+1. We show that when λ < λ_c, there is an FPTAS for computing the partition function; and when λ = λ_c, there is a PTAS for computing the log-partition function. These algorithms are based on the decay of correlation (strong spatial mixing) property of Gibbs distributions. When λ > 2λ_c, there is no PRAS for the partition function or the log-partition function unless NP = RP. Towards obtaining a sharp transition in the computational complexity of approximate counting, we study the local convergence from a sequence of finite hypergraphs to the infinite lattice with specified symmetry. We show a surprising connection between the local convergence and the reversibility of a natural random walk. This leads us to a barrier for the hardness result: the non-uniqueness of the infinite Gibbs measure is not realizable by any finite gadgets.
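The partition function above, Z(λ) = Σ_M λ^|M| over all matchings M, can be made concrete by brute-force enumeration on a tiny hypergraph. This is only an illustrative sketch (the toy hypergraph is invented here), not the FPTAS the abstract describes:

```python
from itertools import combinations

def partition_function(hyperedges, lam):
    """Compute Z(lam) = sum over matchings M of lam^|M| by brute force.

    A matching is a set of pairwise vertex-disjoint hyperedges;
    the empty matching contributes lam^0 = 1.
    """
    edges = [frozenset(e) for e in hyperedges]
    z = 0.0
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            union = set().union(*subset) if subset else set()
            # Disjointness check: sizes add up iff no vertex is shared.
            if sum(len(e) for e in subset) == len(union):
                z += lam ** r
    return z

# Toy 3-uniform hypergraph: two hyperedges share vertex 3, one is disjoint.
H = [{1, 2, 3}, {3, 4, 5}, {6, 7, 8}]
z1 = partition_function(H, 1.0)  # counts matchings: 6 in total
```

At λ = 1 the partition function simply counts matchings; the FPTAS in the paper achieves the same goal approximately, but in polynomial time for λ < λ_c.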

    Code Prediction by Feeding Trees to Transformers

    We advance the state-of-the-art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. First, we report that using the recently proposed Transformer architecture even out-of-the-box outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al. 2018) by 18.3%, the Deep3 system (Raychev et al. 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al. 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset, as well as on a Facebook internal Python corpus. Our code and data preparation pipeline will be made available in open source.
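One generic way to feed tree structure to a sequence model (not necessarily the serialization the paper uses) is to linearize the AST by a pre-order traversal with bracket tokens marking subtree boundaries. A minimal sketch using Python's stdlib `ast` module:

```python
import ast

def linearize(source):
    """Linearize a Python AST into a token sequence: node type names in
    pre-order, with "(" / ")" delimiting each node's children."""
    tree = ast.parse(source)
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)
        children = list(ast.iter_child_nodes(node))
        if children:
            tokens.append("(")
            for child in children:
                visit(child)
            tokens.append(")")

    visit(tree)
    return tokens

toks = linearize("x = 1 + 2")
# e.g. ['Module', '(', 'Assign', '(', 'Name', ..., 'BinOp', ...]
```

Such a bracketed sequence can then be consumed by any standard Transformer tokenizer/embedding pipeline; the brackets are what let a sequence model recover subtree extents.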

    Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

    Pretrained code language models have enabled great progress towards program synthesis. However, common approaches only consider in-file local context and thus miss information and constraints imposed by other parts of the codebase and its external dependencies. Existing code completion benchmarks also lack such context. To resolve these restrictions, we curate a new dataset of permissively licensed Python packages that includes full projects and their dependencies, and we provide tools to extract non-local information with the help of program analyzers. We then focus on the task of function call argument completion, which requires predicting the arguments to function calls. We show that existing code completion models do not yield good results on our completion task. To better solve this task, we query a program analyzer for information relevant to a given function call, and consider ways to provide the analyzer results to different code completion models during inference and training. Our experiments show that providing access to the function implementation and function usages greatly improves argument completion performance. Our ablation study provides further insights on how different types of information available from the program analyzer, and different ways of incorporating that information, affect model performance.
    Comment: 12 pages. Accepted to AAAI 202
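The flavor of analyzer-provided context described above can be approximated in miniature with Python's stdlib `inspect` module: given the resolved callee, extract its signature and documentation to prepend to the model's prompt. A sketch under the assumption that the callee is importable at analysis time (`resize` below is a hypothetical function, not from the paper):

```python
import inspect

def call_context(func):
    """Collect non-local context a program analyzer could supply for
    argument completion: callee name, parameter list, and docstring."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "params": [p.name for p in sig.parameters.values()],
        "doc": inspect.getdoc(func) or "",
    }

# Hypothetical callee whose arguments a model must complete.
def resize(image, width, height, keep_aspect=True):
    """Resize an image to the given dimensions."""
    ...

ctx = call_context(resize)
```

The paper additionally considers richer signals such as the full function implementation and usage examples; this sketch shows only the simplest signature-level context.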

    Large Language Models of Code Fail at Completing Code with Potential Bugs

    Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of high-performing Code-LLMs. For instance, the passing rates of CodeGen-2B-mono on test cases of buggy-HumanEval drop more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that there remains a large gap in post-mitigation performance.
    Comment: 25 pages
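A "semantics-altering operator change" of the kind used to build buggy-HumanEval can be illustrated with a tiny AST transformer; this is a minimal sketch of the idea, not the dataset's actual mutation procedure:

```python
import ast

class FlipAddSub(ast.NodeTransformer):
    """Inject a semantics-altering bug: swap every binary + for -
    (and vice versa) in the parsed source."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node

def inject_bug(source):
    tree = FlipAddSub().visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python >= 3.9

buggy = inject_bug("def total(a, b):\n    return a + b")
# The completed program around such a context now computes a - b.
```

The mutated prefix compiles and looks plausible, which is exactly what makes this setting hard for completion models: nothing is syntactically wrong, only semantically.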

    Structural Realization with GGNNs

    To appear in Proceedings of the 15th Workshop on Graph-Based Natural Language Processing (TextGraphs-15), 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
    In this paper, we define an abstract task called structural realization that generates words given a prefix of words and a partial representation of a parse tree. We also present a method for solving instances of this task using a Gated Graph Neural Network (GGNN). We evaluate it with standard accuracy measures, as well as with respect to perplexity, where its comparison to previous work on language modelling serves to quantify the information added to a lexical selection task by the presence of syntactic knowledge. That the addition of parse-tree-internal nodes to this neural model should improve it, with respect both to accuracy and to more conventional measures such as perplexity, may seem unsurprising, but previous attempts have not met with nearly as much success. We have also learned that transverse links through the parse tree compromise the model's accuracy at generating adjectival and nominal parts of speech.

    Effects of Neighborhood Competition and Stand Structure on the Productivity of Pure and Mixed Larix principis-rupprechtii Forests

    Understanding the factors influencing tree productivity is central to forest ecology. However, the relative contributions of neighborhood interactions, tree species diversity, and tree size to larch (Larix principis-rupprechtii) productivity require further study. Three plots in the Guandi Mountains, Shanxi Province, were set up for each of the following forest types: natural pure larch forest (PL), mixed larch and birch (Betula platyphylla) forest (LB), and mixed larch and spruce (Picea asperata) forest (LS). Based on the tree size-stratified sampling method, a total of 318 tree core samples were collected. A linear mixed model was used to analyze the effects of tree size, dominance, mixing, and neighborhood competition on larch productivity. Birch and spruce promoted larch growth at the stand and individual tree levels, and birch exhibited a more significant facilitating effect. Intraspecific competition was the main factor affecting larch growth. When the intensity of competition among trees was low, the basal area increment (BAI) of larch in the mixed forests was higher than that in the pure forest. However, with increasing competition, the BAI of larch was lower in the mixed forests than in the pure forest. Factors including tree size, dominance, and mingling were positively correlated with the BAI of larch. With increasing tree size, the BAI of larch was higher in the mixed forests than in the pure forest, and higher in LB than in LS. When the dominance was less than 0.5, the BAI of larch was higher in the pure forest than in the mixed forests, and higher in LS than in LB. With increasing dominance, the BAI of larch was higher in the mixed forests than in the pure forest. The BAI of larch increased with an increasing mixing degree in the mixed forests, and the increasing trend of BAI was larger in LB than in LS. Larch productivity was influenced mainly by neighborhood interactions and stand structure. Improving neighborhood tree diversity and increasing the large-tree proportion and dominance of larch will be helpful for improving larch productivity in mixed forests.
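The regression setup described above can be sketched in simplified form. The paper fits a linear *mixed* model with plot-level random effects; the sketch below is a fixed-effects-only stand-in on synthetic data (all coefficients and noise levels invented for illustration), showing only the qualitative pattern: BAI rises with tree size and mixing, and falls with competition.

```python
import numpy as np

# Synthetic stand-in data: BAI as a linear function of tree size (DBH),
# a neighborhood competition index, and species mingling degree.
rng = np.random.default_rng(0)
n = 300
dbh = rng.uniform(10, 40, n)          # tree diameter at breast height, cm
competition = rng.uniform(0, 1, n)    # neighborhood competition index
mixing = rng.uniform(0, 1, n)         # species mingling degree
bai = (0.5 + 0.08 * dbh - 2.0 * competition + 1.2 * mixing
       + rng.normal(0, 0.1, n))

# Ordinary least squares via numpy; a real analysis would add
# plot-level random effects (the "mixed" part of the mixed model).
X = np.column_stack([np.ones(n), dbh, competition, mixing])
coef, *_ = np.linalg.lstsq(X, bai, rcond=None)
# coef recovers roughly [0.5, 0.08, -2.0, 1.2]: size and mixing
# increase BAI, competition decreases it.
```

The sign pattern of the fitted coefficients is what carries the ecological interpretation; the random-effects structure omitted here mainly corrects the standard errors for plot-level clustering.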

    Optical-lattice-like waveguide structures in Ti:Sapphire by femtosecond laser inscription for beam splitting

    In this work, we report on the fabrication of deeply embedded optical-lattice-like structures in a Ti:Sapphire crystal by applying femtosecond laser inscription (FLI) to implement two-dimensional (2D) one-to-two and three-dimensional (3D) one-to-four beam splitting. Such a family of photonic microstructures is characterized in the near-infrared both experimentally and numerically, showing excellent capability of simultaneous light confinement and beam tailoring at two orthogonal polarizations. The confocal micro-Raman image of the obtained structure reveals that the optical properties of the substrate have been well preserved in the waveguide's active volumes. Our results pave the way toward constructing complex integrated waveguide splitters in Ti:Sapphire crystals by using FLI for photonic applications.
    This work is supported by the National Natural Science Foundation of China (No. 11404194 and No. 11404196). The authors acknowledge support from Junta de Castilla y León (Project SA046U16) and MINECO (FIS2015-71933-REDT). The authors would like to thank Prof. Ajoy K. Kar and Dr. Mark D. Mackenzie from Heriot-Watt University for their help with the µ-Raman intensity measurement.